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ABSTRACT 

We present a new Monte Carlo Markov Chain algorithm for CMB analysis in the low signal-to- 
noise regime. This method builds on and complements the previously described CMB Gibbs sampler, 
and effectively solves the low signal-to-noise inefficiency problem of the direct Gibbs sampler. The 
new algorithm is a simple Metropolis-Hastings sampler with a general proposal rule for the power 
spectrum, C^, followed by a particular deterministic rescaling operation of the sky signal, s. The 
acceptance probability for this joint move depends on the sky map only through the difference of 
X^'s between the original and proposed sky sample, which is close to unity in the low signal-to-noise 
regime. The algorithm is completed by alternating this move with a standard Gibbs move. Together, 
these two proposals constitute a computationally efficient algorithm for mapping out the full joint 
CMB posterior, both in the high and low signal-to-noise regimes. 
Subject headings: cosmic microwave background — cosmology: observations — methods: numerical 



1. INTRODUCTION 

Since the detection of anisotropy in the Cosmic Mi- 
crowave Background (CMB; Smoot et al. 1992), there 
has been an emphasis on likelihood or Bayesian methods 
for the inference of cosmological parameters and their 
error bars, or more generally, their confidence intervals. 
CMB analysis is most suitably addressed in a Bayesian, 
as opposed to frequentist, framework, simply because the 
observed microwave sky is interpreted as a single realiza- 
tion of a spatial random process. 

Early measurements of the CMB were limited to sig- 
nal to noise ratios of order unity at relatively low 
angular scales, where direct evaluation of the likeli- 
hood for the power spectrum or cosmological parame- 
ters is possible. However, the 0{N^) scaling of com- 
putational expense with pixel number N prohibits di- 
rect likelihood evaluation for current and future CMB 
observations. Motivated by the scientific potential of 
CMB data with increasingly high spatial resolution, 
yet beset with systematics including partial sky cov- 
erage and foregrounds, an iterative method of sam- 
pling from the Bayes posterior, using a special case 
of Markov Chain Monte Carlo ( MCMC) known as 
Gibb s sampling, was introduced by ([Jewell et al.l 1200^ 
|2004|) . The method was later i ndependently discov - 
ered and applied to COBE data (jWandelt et al.ll2004f ). 
numerically extende d to high-resolution on the sphere 
(jEriksenet al.ll2004D . applied to analysis of the WMAP 
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I2007D data (lO'Dwver et al.ll2004l:lEriksen et aLlEoOTallbl )? 
as well as generaliz ed to include inference of foreground 
model parameters (jEriksen et al.l 2008a, b). 

While Gibbs sampling provably converges to the 
Bayes posterior over the entire range of angular scales 
probed by the data, the run-time required to generate 
enough independent samples at the low signal-to-noise, 
small angular scale regime was found to be prohibitive 
(jEri ksen et al.l[20M ). The reason for this is that typical 
variations in the power spectrum from one sample to the 
next are determined by cosmic variance alone, whereas 
the posterior itself is given by both cosmic variance and 
noise. This results in a long correlation length in the se- 
quence of spectra in the low signal to noise regime, thus 
requiring a very long run time to generate a sufficient 
number of independent samples. 

In this paper we generalize the original Gibbs sam- 
pling algorithm to include a new type of MCMC step 
alternating with standard Gibbs sampling, which solves 
this problem of slow probabilistic convergence in the low 
signal to noise regime. This method therefore makes pos- 
sible an exact Bayesian approach to CMB analysis over 
the entire range of angular scales probed by current and 
future experiments. 

The paper is organized as follows. We first review the 
CMB Gibbs sampler, and describe the associated nu- 
merical difficulties in analysis at small angular scales. 
We then introduce the new MCMC step to the Markov 
chain, designed specifically to allow large variations in 
the high-£ CMB spectrum, precisely where the signal to 
noise is < 1. We derive the required Metropolis-Hastings 
acceptance probability correctness in Appendix \^ and 
numerically demonstrate the method in Section IH for 
both temperature and polarization. Finally, we summa- 
rize and conclude in Section [S] 



2. REVIEW OF GIBBS SAMPLING 
2.1. The Joint Posterior 



We begin by assuming that the observed data may be 
modelled by a signal and a noise term, 



d = As 



(1) 



where d is a vector containing the data (at every point- 
ing of the detectors) , the matrix A involves both point- 
ing and beam convolution (and where for this paper 
we will assume symmetric beams and neglect the de- 
tails of this operation), and n is additive noise (here 
in the pixel domain). We assume both the CMB sig- 
nal and noise to be Gaussian random fields with van- 
ishing mean and covariance matrices S and N, respec- 
tively. In harmonic space, where s = J^gm'^i-mYim, 
the CMB temperature covariance matrix is given by 
Gem,i'm' = {a*im'^i'm') = CiSu'Smm', Ci being the an- 
gular power spectrum. A generalization to polarization 
merely requires the replacement of the signal matrix di- 
agonal elements with 3x3 matrices of the form 
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For the discussion in this section, wc focus on the tem- 
perature case, but note that the generaliza tion to polar- 
ization is straightforward and discussed bv lLarson et al.l 

Hop). 

Given these asumptions, our goal is to quantify what 
has been learned about the underlying power spectrum of 
the CMB given the data, or how well the data constrain 
the cosmological parameters. One proceeds then, in a 
Bayesian framework, by writing down the posterior given 
the data, 

P{Ci\d)o,C{d\Ce)P{Ci). (3) 

Here £{d\Ci) is the likelihood and P{Ci) is a prior on 
Ci. 

In order to derive the functional form of the likelihood, 
one imagines randomly choosing any relevant model [here 
a power spectrum drawn from P{Ci)], and asks what se- 
quence of effects needs to be modeled in order to simulate 
the data. Here, simulation is understood as conditioning 
on the chosen model, and leads to a joint density 

P{d,s,Ce) = P{d,s\Ci)P{Ci) 

= P{d\s)P{s\Ce)P{Ce) (4) 

where the last line follows directly from our data model 
through the assumption of additive noise. Specifically, 
the factors in the above are 

- 2 logP(s|C£) ^ s*C-is - log |C| 
-21ogP(d|s) = -(d-s)*N-i(d-s) -log|N| (5) 

which follow from the assumption that both the signal 
and noise are independent Gaussian processes. 

The idea of a "simulation chain" provides a conceptu- 
ally clear approach to constructing a joint density, from 
which we immediately have the Bayesian posterior 



P(C,|d)= fdsPiCi,s\d) 



(6) 



The relevance of the above for this paper lies in relating 
what we refer to as the joint posterior, P(Ce, s\d), and 
the more familiar likelihood cld\Ci) oc P(Cf |d)/P(C^), 



Although we can analytically compute the integral of 
the joint posterior over the signal for the Gaussian sig- 
nal and noise processes considered here, and therefore 
simply write down the functional form of the likelihood, 
it is too expensive to evaluate it for any specified Ci 
given high-resolution data. Furthermore, for more com- 
plicated data models (i.e. including foreground model 
uncertainties) we will not be able to perform the inte- 
grals over the additional degrees of freedom. Both sit- 
uations then instead motivate sampling from the joint 
posterior, and thereby generating samples from P(C£ |d) 
without ever evaluating P{Ci\d). We now discuss the 
original Gibbs sampling appr oach proposed and im ple- 
mented byLJ cwcU et al. (2004l). IWandelt et al.l(|2004 l and 
lEriksen et all (12004), and then introduce a new MCMC 
step which directly addresses the previously reported 
slow pr obabilistic convergen ce in the low signal to noise 
regime (jEriksen et al.ll2004D . 

2.2. The CMB Cibbs sampler 

As stated above, our goal is to sample from the joint 
posterior, 

-21ogP(s,Q|d)=x'(d,s)+s*S-is+log|S|+logP(C,). 

(^) 
For notational convenience, we have here dropped con- 
stant factors of 27r, and also defined 

x2(s,d) = (d-s)*N-i(d-s). (8) 

One approach to sample from this posterior is to use an 
algorithm known as Gibbs sampling, where we can alter- 
nately sample from the respective conditional densities, 

,1+1 
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Here ^- indicates sampling from the distribution on the 
right-hand side. After some burn-in period, during which 
all samples must be discarded, the joint samples (s\ C^) 
will be drawn from the desired density. Thus, the prob- 
lem is reduced to that of sampling from the two condi- 
tional densities P{s\Ci,d) and P(C^|s,d). 

We now describe the sampling algorithms for each 
of these two conditional distributions, starting with 
P{Ci\s,d). First, note that P{Ci\s,d) = P(Q|s) which 
follows directly from the construction of the joint density 
of "everything" above. This is also intuitively easy to un- 
derstand since if we already know the CMB sky signal, 
the data themselves tell us nothing new about the CMB 
power spectrum. Next, since the sky is assumed to be 
Gaussian and isotropic, the distribution reads 



P(Q|s)oc P(C,)- 
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which, when interpreted as a function of C^, is known as 
the inverse Gamma distribution. In this expression, ui = 
W+i X^m k^niP denotes the observed power spectrum of 
s. Fortunately, there exists a simple te xtbook sampling 
algor ithm for this distribution (e.g., iGupta &: Nagad 
|200C1| ). and we refer the interested reader to the previous 
papers for details. For an alternative, and more flexible, 
sampling algorithm, see lEriksen fc WehusI (|2008f ). 

In order to describe the sky signal sampling step, we 
first define the mean-field map (or Wiener filtered data) 



to be s = (S^^ + N"^)"^N"M, and note that the con- 
ditional sky signal density given the data and Ci can be 
written as 

P(s|Q, d) oc e-^(-^)'(s-^+N-^)(s-s)_ (12) 

Thus, P{s\Ci,A) is a Gaussian distribution with mean 
equals to s and a covariance matrix equals to (S^^ + 
N"i)-i. 

Sampling from this Gaussian distribution is straight- 
forward, but computationally somewhat cumbersome. 
First, draw two random white noise maps ljq and wi with 
zero mean and unit variance. Then solve the equation 
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N"M 



S 2tJo+N '^ui. 



(13) 



for s. Since the white noise maps have zero mean, one 
immediately sees that (s) = s, while a few more calcula- 
tions show that (ss*) = (S~^ -I- N~i)~^ 

The problematic part about this sampling step is the 
solution of the linear system in Equation 1131 Since this 
a ~ 10^ X 10^ system for current CMB data sets, it can- 
not be solved by brute force. Instead, one must use a 
method called Conjugate Gradients (GG), which only 
requires multiplication of the coefficient matrix on the 
left-hand side, not inversion. For details on these com- 
putations, together with some ideas on preconditioning, 
see Erikscnet al. (2004). 

2.3. Convergence issues in the low signal-to-noise 
regime 

As originally applied to high-resolution CMB data, the 
Gibbs sampling algorithm as described above has very 
slow convergence at the high-£, low signal-to-noise part 
of the spectrum. The reason for the slow convergence is 
easy to understand in light of the above: When sampling 
from P{Ci\s), the typical step size is given by cosmic 
variance at all angular scales. In the high signal-to-noise 
regime, cosmic variance dominates the noise variance, 
and we are able to explore the full width of the poste- 
rior in only a few Gibbs iterations. However, in the low 
signal-to-noise end, cosmic variance is far smaller than 
the posterior variance, and it takes a prohibitively long 
time to converge probabilistically. This problem of "slow 
mixing" of the Gibbs sampler is illustrated in figures [T] 
and [21 The long correlation length starting at signal-to- 
noise of unity leads to extremely long run times in order 
to produce a reasonable number of uncorrelated samples. 

3. A LOW SIGNAL-TO-NOISE MCMC SAMPLER 

When sampling from the true posterior, the goal is to 
produce as many independent samples from P{Ce, s|d) as 
possible. One might intuitively guess that it should be 
straightforward to establish good approximations to the 
posterior in the low signal-to-noise regime, since in the 
limit of vanishing signal to noise we simply recover the 
prior. This suggests that we look for a sampling scheme 
in which we first sample a new spectrum from some ap- 
proximation to the true posterior independent on the cur- 
rent spectrum and CMB map, followed by sampling the 
CMB map from the conditional P{s\Ce, d). The problem 
with such a direct scheme is that the accept probability 
will involve a ratio of determinants which are too expen- 
sive to compute. 

We are therefore motivated to look for a sampling 
scheme in which we can make a large variation in Ci 



in the low signal-to-noise regime, and make an associ- 
ated deterministic change in the CMB map, while still 
maintaining a reasonably high acceptance rate. The mo- 
tivation for a deterministic change is that it will avoid 
introducing ratios of determinants which we cannot com- 
pute. 

3.1. Proposal rule and acceptance probability 

Assume that we have defined a deterministic sampling 
scheme for s, and that our new CMB map is given by 
some function 






(14) 



Then the condition of detailed balance for our MCMC 
sampler requires that 

F-i(s„+i,c("+^\cl")) =^(s„+i,cf\cl"+^^ (15) 

or, in other words, that the inverse function is given by 
exchanging the order of the spectra in the function F. 
One simple function which has this property is 
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The total proposal matrix is then 
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and the "reverse" proposal is 
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The condition of detailed balance including deterministic 
moves requires the consideration of some technical points 
which we leave to Appendix [X] There we show that the 
full Metropolis-Hastings accept probability reads 



-X (sn+i.d; 



1, 
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(17) 



The significance of the above is that we can make rel- 
atively large changes to the power spectrum in the low 
signal-to-noise regime, where N~^ is getting small, since 
the x^ is affected only very mildly by changes in any low 
signal-to-noise mode. 

We note the interesting point (discussed more com- 
pletely in Appendix |B| that if one changes variables 

in the joint posterior from CMB maps, s, to whitened 

_\ 
maps, X = C^ ^s, and then Gibbs sample in the new 
variables (C£,x), the resulting accept probability is nu- 
merically identitical to the above. However, we note 
the distinction here to emphasize the difference between 
MCMC algorithms implementing deterministic proposals 
of maps given C^, and those sampling in a different set of 
variables, as there could be other deterministic proposal 
schemes or another change of variables which lead to im- 
provements over the approach presented in this paper. 



For the numerical demonstration of the MCMC algo- 
rithm presented in this paper, we use a simple symmetric 
Gaussian proposal, truncated at C^ > (or, for polar- 
ization, the region where the resulting CMB covariance 
matrix is positive definite), for the power spectrum, 



u;(c("+^)|ci"\ 



oc e 



I{Ci > 0), (18) 



where Tg is a measure of the typical step size taken be- 
tween two samples. Note that because this proposal den- 
sity is symmetric, the ratio of Ci proposals cancels, and 
the acceptance probability is entirely determined by the 
change in x^- 

It should be noted that while the above MCMC step 
satisfies detailed balance, it is not irreducable, in the 
sense that there is not a non-vanishing probability in 
reaching any state from any other state in a finite num- 
ber of MCMC steps; the phases are unchanged in each 
MCMC step. However, alternating these steps with a 
traditional Gibbs sampling step gives a combined "two- 
step" MCMC algorithm which indeed is irreducable, and 
therefore provably converges to the joint posterior. Once 
again, the details are left to the appendix for the inter- 
ested reader. 

3.2. Optimization of the MCMC sampler 

A general advantage of the Gibbs sampler is the fact 
that it is free of tunable efficiency parameters. The 
same is not true for the Metropolis-Hastings MCMC al- 
gorithm; for satisfactory sampling performance, it typi- 
cally has to be tuned quite extensively. In this section, 
we describe three specific features that helps in this task, 
namely 1) step size tuning, 2) slice sampling and 3) bin- 
ning. 

First, we have to ensure that the step size of our Gaus- 
sian proposal density roughly matches the width of the 
target distribution, in order to maintain both a reason- 
able acceptance rate and high mobility. We do this by 
performing an initial test run, producing typically a few 
hundreds Ce samples, and compute the standard devia- 
tion of these samples for each £. These are then adopted 
as the proposal widths for the main run, scaled by some 
number less than unity, typically between 0.05 and 0.5. 
For the initial test run, we approximate the posterior 
width by the noise variance alone, 



m 



21 + I hi 



(19) 



because the MCMC sampler is used only in the low 
signal-to-noise regime. In this expression Ni is the power 
spectrum of the instrumental noise alone, and hg is the 
product of the Legendre transform of the beam and the 
HEALPix window function. 

Next, Metropolis-Hastings MCMC is inefficient in 
spaces with too many free parameters. For this rea- 
son, we divide the power spectrum cocfhcients, C^, into 
subsets, each containing typically only 10-20 multipoles. 
Then we propose changes to one subset at a time, while 
keeping all other multipoles fixed. Finally, we loop over 
subsets, and thus effectively implement a multipole slice 
Gibbs sampler for the full power spectrum. 

This is computationally feasible, because a single 
MCMC proposal only requires a single x^ evaluation. 
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Fig. 1. — Comparison of Ci chains produced by standard Gibbs 
sampling (black) and by the Gibbs+MCMC hybrid (red) for three 
selected multipole bins. The simulation was based on full sky cov- 
erage and uniform noise. See text for full details. 

which has a computational cost of a single spherical har- 
monic transform. Since drawing a full sky map from 
P{s\Cg,d) in the classical Gibbs sampling step requires 
O(IO^) spherical harmonic transforms, we can indeed af- 
ford to perform many MCMC proposals for each Gibbs 
step, without dominating the total cost. 

Nevertheless, for very high-resolution analysis it is of- 
ten beneficial to bin several C^'s together, both in order 
to increase the signal-to-noise of the joint coefficient, and 
to decrease the number of parameters that needs to be 
sampled by MCMC. We implement this by defining a new 
binned spectrum, weighted by C.{li -\- l)/27r, as follows. 
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] denotes the current bin, and A^^ 



Here h 

^max — ^min -l- 1 is thc numbcr of multipoles within the 
bin. These new (and fewer) coefficients are then sampled 
with the above MCMC sampler, after which the original 
spectrum coefficients are given by 



C, = 
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(21) 



4. TESTING AND VALIDATION 



We have implemented the new sampling step described 
above in the previously Gibbs sampling code called 
"Commander" (jEriksen et all 12004 [2008al i. and in this 
section we demonstrate its advantages compared to the 
old sampling algorithm. We consider two different cases, 
namely high-i? temperature and low-£ polarization anal- 
ysis. In the former case, we also analyse two cases, with 
and without a sky cut. The former allows us to verify the 
results against an analytically known answer, while the 
second demonstrates that the sky cut does not degrade 
the sampling efficiency. 

4.1. Tem,perature analysis 

The high-£ temperature simulation is designed 
to mimic the 5-year WMAP temperature data 




40 60 

Distance in chain 

Fig. 2. — Comparison of chain correlation functions for standard 
Gibbs samphng (blue) and Gibbs+MCMC (red), computed from 
the full-sky uniform noise temperature data set.. Note that while 
the correlation length goes to infinity with increasing t. (or equiva- 
lently, low signal-to-noise) for standard Gibbs sampling, it is < 40 
everywhere for the MCMC hybrid case. 



(jHinshaw et al.l l2008f ) with one exception, namely that 
the noise is assumed spatially uniform, in order to fa- 
cilitate analytic comparison. Specifically, the CMB re- 
alization was drawn from the best-fit ACDM m odel de- 
rived from WMAP alone (jKomatsu et al.ll2008f) . includ- 
ing multipoles up to ^max = 1000, and then smoothed 
with the instrumental beam of the WMAP VI differ- 
encing assembly, and pixelized at HEALPix^° resolution 
A'side = 512. Finally, uniform noise of ctq = 40/iK RMS 
was added to each pixel. This corresponds to a signal- 
to-noise ratio of unity at £ ~ 550, roughly similar to the 
5-year WMAP data. We analyse this simu lation both 
with and without the WMAP KQ85 sky cut (jGold et all 
[200l . 

In both analyses, we adopted the Gaussian proposal 
density with tuned variances, as described above. We 
also bin the power spectrum in progressively wide bins, 
starting at £ = 600, to maintain a reasonable signal-to- 
noise per sampled power spectrum parameter. Ten bins 
were sampled jointly per proposal, while all others were 
kept fixed. 

In the full-sky case, we produced a total of 31,800 sam- 
ples over 60 chains, and in the cut sky case a total of 6800 
samples. The cost for producing one sample in the lat- 
ter, and by far most expensive, set was 2.5 CPU hours, 
for a total of 17000 CPU hours. The number of MCMC 
steps per Gibbs step was one in the former and 20 in 
the latter. (Since the the signal sampler dominates the 
cut sky Gibbs chain one can perform more low S/N steps 
without slowing down the overall code significantly.) In 
addition to these two main sample sets, we also produced 
two longer chains with each 3500 samples for the full- 
sky casee, both with and without the new MCMC step 
turned on, in order to compare the Markov chain cor- 
relation lengths before and after including the MCMC 
sampler. 

We first consider the full-sky data set, and in Figure 
[T] we show a segment of each of the two longer chains 
for three selected multipole bins. The top panel shows 
£ = 600, which is the first bin to be sampled by MCMC, 

^'•^ http://hcalpix.jpl.nasa.gov 
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Fig. 3. — Gelman-Rubin statistic for the full-sky, uniform noise 
temperature analysis. Note the feature at £ = 600, which marks the 
transition between standard Gibbs sampling and Gibbs-|-MCMC. 

the middle panel shows i = 732 — 742, where there is 
still some signal in the data, and, finally, the bottom 
panel shows I — 855 — 1000, which is strongly noise dom- 
inated. Starting with the top panel, we see that the red 
curve (Gibbs-|-MCMC) scatters significantly faster than 
the black curve (Gibbs only), implying more efficient 
sampling. This trend becomes even stronger with lower 
signal-to-noise, until the last case, where the Gibbs-only 
chain essentially does not move at all, while the MCMC 
sampler does probe the full range. Note, however, that 
even the MCMC sampler has a significant correlation 
length in this range, and this implies that there is still 
some room for improvement to be made in defining our 
proposals. 

Next, these considerations are quantified in Figure [H 
where we plot the Markov chain correlation length as 
a function of distance in the chain, for six bins with 
and without the MC MC sampler. As first reported by 
lEriksen et al.l ()2004f ). we see that the Gibbs-only cor- 
relation length increases dramatically with decreasing 
signal-to-noise, rendering the algorithm essentially use- 
less in this regime. However, we also see that the new 
MCMC step effectively resolves this issue, as the corre- 
lation length (here defined by having a correlation less 
than 0.2) now is less than ~ 40 steps. This is a dramatic 
improvement, and makes the algorithm useful even in 
this range. Nevertheless, we once again point out that it 
is possible to make further improvements by establishing 
better proposal densities. 

In Figure [3] we consider the convergence properties of 
the ~ 30k s amples set, by computin g the Gelman-Rubin 
statistic R (jGelman fc RubinI |1992[ ) as a function of L 
Typically, one recommends that R should be less than, 
say, 1.2 in order to claim convergence. We see that this 
holds everywhere for this sample set, and typically it is 
even less than 1.05. Note also the step at £ = 600, show- 
ing clearly the beneficial effect of the MCMC sampler. 

Next, in Figure 2] we compare the marginal distribu- 
tions derived from this sample set with the analytic re- 
sult. 
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Fig. 4. — High-£ temperature marginal posteriors computed with 
Gibbs+MCMC from the full-sky, uniform noise temperature data 
set, compared to analytic results. 

denotes the product of the instrumental beam and the 
HEALPix pixel window, and a^ is the power spectrum 
of the noisy data map. We see that the new algorithm 
reproduces the analytic distributions very well, and this 
verifies the overall method. 

Finally, the cut-sky power spectrum with one-sigma 
confidence regions is shown in three panels in Figure 
[H focusing on different ^-ranges, namely all £'s, the 
S/N ~ 1 transition region, and the low S/N region. This 
completes the high-£ temperature analysis validation. 

4.2. Polarization analysis 

We now consider polarization analysis, and construct 
a new \ow-i simulation for this purpose. This simulation 
does not mimic any planned experiment, but is rather 
designed to highlight the analysis method itself. Specif- 
ically, we drew a new CMB realization from the best-fit 
WMAP ACDM spectrum that includes a non-zero tensor 
contribution, including multipoles up to fmax = 150, and 
convolved this with a 3° FWHM Gaussian beam, and 
pixelized it at A'sidc = 64. Uniform noise of 5fiK RMS 
was added to the temperature component, and 1/iK RMS 
to the polarization components. The 5-year WMAP po- 
larization sky mask was imposed on the data. 

We allowed for non-zero Cj^, Cj^, Cf'^ and Cf^ 
spectra, but fixed Cf^ ~ df^ — 0. These spectra were 
then individually binned to maintain a reasonable signal- 
to-noise per bin. (Details on how to introduce individual 
binning of each power spectrum were recently described 
by Eriksen and Wehus, 2008.) Again, a tuned Gaussian 
proposal density was used in the MCMC step. A total 
of 12 000 samples were produced over 12 chains, and the 
CPU time per sample was 55 seconds, for a total of ^ 200 
CPU hours. 

In Figure [6] we show one Ci chain for each of the four 
sampled spectra, for the last (and therefore most diffi- 
cult) bin in each case. Note that the Cf^ and Cf^ spec- 
tra have essentially vanishing signal-to-noise, and there- 
fore these chains reach zero values. Clearly, we see that 
mixing properties of these chains are satisfactory, and 
the correlation lengths are quite short. 

In Figure [7] we show the Gclman- Rubin statistics for 
each of the four power spectra, and with the single ex- 
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Fig. 5. — Temperature power spectrum estimated from cut sky 
temperature data. The panels show the same spectrum, but em- 
phasizing different multipole ranges (full-range; S/N^^l transition 
region; and high-f, low S/N). 

ception of the very last bin of Cf^ , all R values are well 
below 1.1. Thus, all spectra have converged well every- 
where. 

Finally, in Figure [8] we show the reconstructed 
marginal power spectra for each polarization component, 
overplotted on the input spectrum. The agreement is 
very good. Note, however, that these spectra are di- 
rect marginals, and not a joint maximum likelihood es- 
timate. They are therefore not individual unbiased es- 
timators. In particular, the marginal Cf^ power spec- 
trum is biased slightly high because of the combination 
of the Cj'^Cf^ - {Cj^f > positivity constraint and 
relatively low signal-to-noise. Consideration of the joint 
polarization posterior, which is an unbiased estimator, is 
postponed to a future publication. 

5. CONCLUSIONS 

We have presented a new MCMC algorithm for the 
high-L, low signal to noise limit of the joint posterior 
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Fig. 6. — Ci chains generated by Gibbs+MCMC hybrid for the 
cut-sky polarization data set. Only the highest multipolc bin for 
each spectrum is shown (£ = 108 - 150 for TT, £ = 88 - 150 for 
TE, .f = 101 - 150 for EE and £ = 61 - 150 for BB). 
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Fig. 7. — Gelman- Rubin statistic for cut-sky polarization analy- 
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Fig. 8. — Marginal C^ power spectra (red curves) estimated from 
cut sky polarization data. Gray bands indicate 68% confidence re- 
gions, and the black lines show the input spectrum. (Note that the 
marginal spectra shown here are not individually unbiased estima- 
tors because of the correlations between TT, TE and EE. Proper 
treatment of the full joint polarization density will be considered 
separately in a future publication.) 



which solves the slow probabilistic convergence of the 
traditional Gibbs sampler in this regime. This in prin- 
ciple allows sampling over the joint posterior p{Ci,s\A) 
over the entire range of angular scales probed by current 
and future CMB experiments. The limiting computa- 
tional burden is now entirely in the map-making step of 
Gibbs sampling, for which the cost per Gibbs iteration 
now scales with the expense of multiplication by the in- 
verse noise matrix N^^. Assuming pixel uncorrelated 
(but scan weighted) noise as a good approximation at 
small angular scales, the cost of an N~^ multiplcation is 
that of a forward and inverse spherical harmonic trans- 
form, or 0{(,^^^). Future work will attempt to push the 
generalized Gibbs + MCMC sampling scheme presented 
here to smaller angular scales, ultimately limited by the 
degree to which we can compute harmonic transforms. 
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APPENDIX 
INCLUDING DETERMINISTIC PROPOSALS IN MCMC 

Here we review the derivation of the accept probability in Markov Chain Monte Carlo when using deterministic 
proposals (or proposals where some of the degrees of freedom are specified as deterministic functions of the past state 
and/or proposed variations in some other degrees of freedom) . We first briefly review the Metropolis-Hastings Markov 
Chain Monte Carlo algorithm and the proof of its convergence, and th en turn to th e special case involving deterministic 
proposals. Much of the review of the MCMC algorithm here follows (jSokal Ill989f). We also note that similar technical 
considerations including deterministic elements in proposals are presented in ( Green |[l9950 in the context of MCMC 
algorithms in which the dimension of the state space itself is included as a random variable to be sampled over. 

The goal is the construction of a transition matrix T(Ci,s|C;,s',d) such that after initializing the Markov Chain 
with a sample from any probability density po{Ci, s|d), we generate samples from a sequence of probability densities 

Pn+i{Cus\A) = j d{C[,s') T(Q,s|C/,s',d)p„(Q',s'|d) (Al) 

which eventually converge to an equilibrium density n{Ci, s|d) 

7r(Q,s|d)= lim Pn{Ci,s\d) (A2) 

n — >oo 

We remind the reader of the sufficient conditions to establish convergence of an MCMC algorithm: stationarity, which 
means that the MCMC transition matrix satisfies 

TT{Ci,s\d) ^ J d{C'i,s') T{Ci,s\Ci,s',d) 7T{Ci,s'\d) (A3) 

and irreducability, which means that for any two states, there is a finite number of iterations which give a non- vanishing 
probability to transition from one state to the other. It is well known that these two properties are sufficient to establish 
convergence, as can be seen simply from the triangle inequality 

Jd{Ci,s) \7r{Q,s\d) - pn{Ci, s\d)\^ J d{Ci,s) y d(Q', s') r(G,s|q',s') (7r(Q',s'|d) - Pn-i(Q',s'|d)) 

<Jd{Ci,s) Jd{C'i,s') r(G,s|Q',s')|7r(Q',s'|d)-p„_i(Q',s'|d)| 

= j d{C[,s') (Jd{Ci,s) T{Ci,s\Cl,s')) |7r(Q',s'|d) -p„_i(Q',s'|d)| 

= J dices') |^(Q',s'|d)-p„_i(Q',s'|d)| 

The Metropolis-Hastings Markov Chain Monte Carlo algorithm is one method of constructing such a transition 
matrix. We choose any proposal matrix w{Ci, s|C;', s', d) and then accept the proposed move with a probability 

0<A{Ci,s\C"i,s\d)<l (A4) 

while rejecting the proposed move with probability 1 ~ A leads to a "null transition" where the next state in the 
Markov Chain remains the same. Application of this algorithm then leads to the sequence of probability densities 
which satisfy 

p,,+i{Ci, s\d)=(l- I d{Ci,s') A{C'i,s'\Ci,s,d)w{C'i,s'\Ci,s,d))pn{Ci,s\d) 

+ J dices') A{Ci,s\Ci,s',d)w{Ci,s\C'i,s\d)pn{Ci,s'\d) (A5) 

where the first term is the constribution to the probability density Pn+i if we reject any proposed move, while the 
second term is the contribution from accepting the proposed move from any possible previous state. If we demand 
that, for a chosen proposal matrix, the accept probability satisfies 

Tr(C'i,s'\d)w(Ci,s\Cis\d)A{Ci,s\Cis',d)^A{C'i,s'\Ci,s,d)w{C'i,s'\Ci,s,d)7r{Ci,s\d) (A6) 



then we see that the MH MCMC algorithm satisfies stationarity, i.e. denoting hy T on the density resulting from one 
application of the transition matrix to tt, we have directly from detailed balance 

To7r=(l- J d{Ci,s') A{Cl,s'\Ci,s,d)w{C'i,s'\Ci,s,d)]TT{Ci,s\d) 
+ J d{C'i, s') A{Ci,s\Ci s', d)w{Ci,s\C'i, s', dMC'i, s'|d) 
= (1- I d{Ci,s') A{C'i,s'\Ci,s,d)w{C'i,s'\Ci,s,d)]TT{Ci,s\d) 

+Tr{Ci,s\d) fdiC'i,s') A{Cis'\Ci,s,d)w{C'i,s'\Ci,s,d) 

-7r(Q,s|d) (A7) 

We now turn to the case where our proposal is of the form 

wis', Q'|s, Q) = S[s'- F{s, C[, Ci)] w{C[\Cud) (A8) 

where we randomly propose a new power spectrum, posibly in a manner conditionally denpendcnt on the current 
spectrum and the data, and then deterministically compute a new CMB map with some function 

s'==F(s,q',co 

To satisfy detailed balance with a non-vanishing accept probability our function must satisfy 

s' = F(s,Q',G) 
s = F(s',G,Q') 
or, that the inverse function is equivalent to interchanging the order of the power spectrum arguemcnts 

F(s',a,Q') = ^"'(s',Q',CO 
In this paper, we have chosen one such function, given by 

F(s,Q',C/) = [C']i/2[C]-i/2g 

where interchanging the spectra in the function above does in fact give the inverse function itself. 

Our job now is to derive the accept probability such that we satisfy stationarity (as discussed above). For the 
proposal with deterministic changes to some of the degrees of freedom, stationarity is satisfied if 



(A9) 

(AlO) 
(All) 
(A12) 



(ro^)(G,s|d) 



1 - j d(C[', s") A[C[', s"\Ci,sW - F(s, C'l', Ci)]w(C['\Ci,d) ^(G, s|d) 
+ j d{C[, s') A[s, Ci |s', C[]5[s - F(s', CuC[)]w{Ci\C[, d)^(Q', s'|d) 



In order to determine the integral over the (5-function in the accept term above, we recall the identity for (5[G(x)], 
where G(a) = 0, 

'^[G(x)] = ^^ (A13) 



In our case, we can identify 



\dG/d^l 
G{s')^s-F{s',Ci,C'i) 



which vanishes at F ^(s, C;, C^) — F{s, C[, Ci). We also have the Jacobian 



dG 



ds' 



dF 



ds' 



s' = F-i(s,C,,C,') 

(i.e. G(s') is considered a function of s' with the other CMB map s considered fixed) which therefore gives 

-1 



6[s - F(s', Ci,Cl)] - ,5[s' - F-i(s, Q, Q' 
Inserting this into the condition for stationarity we have 



s'=F-^s,Ci.,Cl) 

isidc 
OF 



ds' 



5' = F-i(s,C,,C,') 



(A14) 
(A15) 

(A16) 



(To^)(G,s|d) 



1 - / d{Cl',s") A[Cl',s"\Gi,s]S[s" - F{s,C'/,Ci)]w{Cl'\Gi,d) 



+ / d(Q',s') A[s,Gi\s',Gi] S[s'-F-\s,Ci,Cl)] 



dF 



ds' 



7r{Gi,s\d) 

)u;(Q|Q',d)7r(Q',s'|d) 



s'=F-i(s,Ci,C;), 



1 - / rf(Q", s") A[G'/, s"\Gi,s]S[s" - F{s, Q", Ci)]w{C'/\Gi,d) 



+ Jd{Gis') A[s,Gi\s',G;] Us'-F(s,Q',GO] 



dF 



ds' 



ACi,s\d) 

]wiCi\Cid)7:{Gl,s'\d) 



i'=F-i(s,c,,c;)y 
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where in the second Hne we again used the property that the inverse _F ^ is equivalent to F with the spectra arguements 
interchanged. We see from the above that a sufficient condition for stationarity is 



7r{Ci,s\d)w{Ci\Ci,d)A[s', Cl\s, Q] = A[s, G|s', Q'] 



dF 



ds' 



w{Q\Cl,d)niCis'\d) 



(A17) 



^F-^s,Ci,c:). 



An accept probabihty which satisfies this condition therefore gives cancehation of the integrals over the (5-functions for 
both the reject and accept contributions, leaving us exactly with To tt = tt. We therefore have the accept probability 



A[s',Ci\s,Ci] =min 



7r{Cl,s'\d)w{Ci\Cid) 
' n{Ci,s\d) w{Ci\Ci,d) 



dF 



ds' 



--F-^{sfii,C[), 



(A18) 



We give the expression above for the general case of any deterministic change in the CMB map with a function which 
satisfies F(s, C;, C[) = F^^(s, C[, Ci). We now explicitly evaluate this accept probability for the functional form chosen 
for this paper. 
Since we have F(s', Ci, C[) = [C]i/2[c']-i/2s'^ we have 



dF 



|C|l/2 



(A19) 



Reminding the reader of the functional form of the joint posterior in eqn. [71 we have the accept probability given by 

^1 ' 



A[s',C;|s,C;]=min 



T:{C[,s'\d)w{Ci\C[,A) 
' 7r(G,s|d) w{C[\Ci,d) 



dF 



ds' 



s'=_F-i(s,C,.q) 



e -X (^ ■'1' e 



)g-s'[C'] Is' |c|l/2 uj{Cl\C'i,d) ( dF 



g-x'(s'.d) g- 



gsC-ig 

-s'fC'1-is' 



\C'\y^w{C'i\Ci,d) 

|c|V2^Q|c;,d) 



"^' s'=_F-i(s,C,,C,')^ 

C'|V 



g-X^(s,d) gsC-is |C'|l/2w(q|Q,d) 

e-x'is',d)^^fji\C',,d) ' 
g-x^(s,d) w{q\Ci,d) 



|C|i/2 



(A20) 



where the last line follows from the invariance of the quadratic form under the functional mapping s'[C'] ^s' = sC ^s. 
Finally, we note that for the special case of a symmmetric proposal matrix where w(C;'|C/, d) = ■u;(C; |C;', d), the accept 
probability is completely determined by the (exponeniated) change in x^ 



A\s',Ci\s,Ci] 



1, 



^x'(s'.d) 
-X^(s,d) 



(A21) 



As emphasized earlier in the main part of the text, the above allows large changes to the spectrum precisely where the 
signal to noise is getting small, as x^ docs not change much in this regime. 

RELATION TO GIBBS SAMPLING IN A CHANGE OF VARIABLES 

We note here another interesting approach to an MCMC algorithm in a different set of variables which in fact allows 
for large moves in the spectrum in the low signal to noise regime. We define the CMB map 



We therefore have the joint posterior in the new variables according to 

9s 



p{Cus\d)d{Cus)^p{CiMd) 



dx 



d{Cu^) 



which is explicitly, up to a normalization constant 

- 2 logp(C;, x|d) = (d - Ci/2x)N-i(d - C^/^x) - ||xf 

Then traditional Gibbs sampling in the new variables leads to an accept probability when changing the spectrum given 
the change of variable map x as 



(Bl) 
(B2) 
(B3) 



A(C;',x|C;,x) =min 



1, 



g-(d-[C']VMN-^(d-[C']VM^;(Q|x,C;,d) 
g-(d-CV2x)N-i(d-Ci/2x) w(q|x,G,d) 



(B4) 



where in the above the proposed variation in the spectrum can now be conditionally dependent on the current change of 
variable map x. Assuming a symmetric proposal, or one conditionally independent of x leads to an accept probability 
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which is numerically the same as \A20\ , and also has the same property - large moves in the spectrum are possible 



1 /2 

in the low signal to noise regime. As a side note, we can see that logp(Ci|x, d) is quadratic in C; , and suggests 



1/2 

a proposal given by a Gaussian in C, 



1/2 

However there are two problems with this scheme - sampling in C^ will 



result in re-introducing a Jacobian factor given by the ratio of \C'\^^^/\C\^^-^ which results typically in low acceptance 
probabilities, and furthermore we cannot afford to exactly compute the local "Fisher" covariance matrix for each x. 
Because of these difficulties, we in general need to produce a proposal for Ci and then compute the accept probability 
above. 

We emphasize an important distinction between MCMC with deterministic steps in the original variables (Ci, s) and 
Gibbs sampling in the change of variables (C; , x) . It is only for the specific functional form that we have chosen for 
this paper that the numerical value of the accept probabilities for A(C; , s'|C;, s) and A(C; , x|C;, x) are the same. 

At first glance, it might appear that a random variation in some of the variables followed by a deterministic change in 
the complementary set is always equivalent to random variation in a new set of variables. For notational convenience, 
we will assume the state space is separated into two sets of variables (x, y), i.e. for the CMB sampling context we have 
{s,Ci). Now, to make the distinction between a change of variables and deterministic steps in MCMC more precise, 
consider a "global" change of variables of the form 

u = F(x,y) 

v = y 

with Jacobian 

^ 



5u dv 



dF -1 
9y ^ 



OF 



dy dy 

A Gibbs sampling step varying v with u fixed, has accept probability 

7r(y„+i|u„,d) w;(y„|u„,d) 



dx 



(B5) 
(B6) 



A(y„+i , u„ |y„ , u„) = min 



1,- 



7r(y„|u„,d) w(y„+i|u„,d) 



1, 



7r(y„+i,x„+i|d) 
7r(y„,x„|d) 



dF 



dy 



, + i,y„+i 



OF 



w(yn|u„,d) 
w(y„+i|u„,d) 



where in the above we have the constraint 

x„+i=i^"^(u„,y„+i) 
x„ = i^"\u„,y„) 
Now consider an MCMC step in the original variables of the form 

w(x„+i,y„+i|x„,y„) = t(;(y„+i|y„,x„,d)(5(a;„+i - iJ(a;„, y„+i, ?;„)) 
with general accept probability, according to the discussion above 

7r(y„+i,x„+i|d) it;(y„|x„,y„+i,d) 



(B7) 

(B8) 
(B9) 



A(y„+i, x„+i|y„, x„) =min 



1, 



dH 



9x 



x„+i=_ff nx,i,yn,yn+i)y 



(BIO) 
(Bll) 



7r(y„,x„|d) w(y„+i|x„+i,y„,d) \ 

Interestingly enough this suggests that we can set H to be the function 

7?(x, y„+i, y„) = F-^ (F(x, y„), y„+i) 

Does this function have the correct properties for its inverse? Assuming we have computed in the forward direction 
x' = iJ(x, y„_|_i, y„), we can invert to find x by computing sequentially 

i^(x',y„+i)=F(x,y„) 

x = f-i(F(x',y„+i),y„) 

= iJ(x',y„,y„+i) (B12) 

where the last line follows from definition of the forward H. Since we have, by definition 

x' = i/(x,y„+i,y„) 

x = i/"^(x',y„+i,y„) 

we therefore have shown that 

-ff~\x',y„+i,y„) = iJ(x',y„,y„+i) (B13) 

as required for a non- vanishing accept probability. The above as a function of x has Jacobian 



dH 




dF- 


-1 








dF 


5x 




du 


(u(x,y„),y„+i) 


ax 




dF 


-1 


dF 








5x 


(X 


yr.+ l) 


9x 


(x,3 


^„) 



(x,y„) 



(B14) 
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However, when evaluated at x„4-i = i/^^(x„,y„,y„_|_i), we will not in general satisfy the required equality required 
for numerical equivalence 



dH 



9x 



^ 



x„ + i=/f-i(x„,y„,y„+i)^ 



OF 



dx 



x„+i,y„+i 



OF 



dx 



(B15) 



x„,y„^ 



So in general, while we can use any function F(x, y) to generate deterministic moves in the original variables within 
MCMC, this is not equivalent to a Gibbs sampling step p(y„+i|u„,d) in the new variables using (F(x, y),y) as a 
global change of variables. 
However, using the above construction for the CMB change of variables, we have explcitly 



F-'iF{s,Q),C'i)^[Cf/'{c-'/'s) 



(B16) 



which is exactly the functional form used for the deterministic MCMC steps. In this case, it is because the Jacobian 
of our deterministic change in the CMB map is independent of the current CMB map s (and only dependent on the 
proposed and current spectra) that we have numerical equivalence of the accept probabilities. 

So in summary, while we can use any mapping i^(x, y) to generate deterministic steps for use in MCMC, the accept 
probability is not equivalent to a conditional step p{y\u, d) using i^(x, y) in a change of variables due to the general 
"location" dependence of the Jacobian. Furthermore, setting iJ(x, y',y) = F^^(F(x, y),y') is not the most general 
form for a function that satisfies the detailed balance requirement iJ^^(x, y',y) — iJ(x, y,y'). In this sense then, a 
change of variables as an approach to more efficiently generating samples from a probability density is distinct from a 
strategy of designing an MCMC algorithm (in any chosen representation of the variables) with deterministic changes 
of some of the degrees of freedom. Both approaches are interesting, and advances in either approach for Bayesian 
CMB analysis could lead to improvements over the approach presented in this paper. 
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